|  | country | n |
|---|---|---|
| 1 | United States | 3297 |
| 2 | India | 990 |
| 3 | United Kingdom | 723 |
| 5 | Canada | 412 |
| 6 | France | 349 |
| 7 | Japan | 287 |
| 8 | Spain | 215 |
| 9 | South Korea | 212 |
| 10 | Germany | 199 |
| 11 | Mexico | 154 |
| 12 | China | 147 |
| 13 | Australia | 144 |
| 14 | Egypt | 110 |
| 15 | Turkey | 108 |
| 16 | Hong Kong | 102 |
| 17 | Italy | 90 |
| 18 | Brazil | 88 |
| 19 | Belgium | 85 |
| 20 | Taiwan | 85 |
| 21 | Argentina | 82 |
| 22 | Indonesia | 80 |
| 23 | Philippines | 78 |
| 24 | Nigeria | 76 |
| 25 | Thailand | 65 |
| 26 | South Africa | 54 |
| 27 | Colombia | 45 |
| 28 | Netherlands | 45 |
| 29 | Denmark | 44 |
| 30 | Ireland | 40 |
| 31 | Singapore | 39 |
| 32 | Sweden | 39 |
| 33 | Poland | 36 |
| 34 | United Arab Emirates | 34 |
| 35 | Norway | 29 |
| 36 | New Zealand | 28 |
| 37 | Russia | 27 |
| 38 | Chile | 26 |
| 39 | Israel | 26 |
| 40 | Lebanon | 26 |
| 41 | Malaysia | 26 |
| 42 | Pakistan | 24 |
| 43 | Czech Republic | 20 |
| 44 | Switzerland | 17 |
| 45 | Uruguay | 14 |
| 46 | Romania | 12 |
| 47 | Austria | 11 |
| 48 | Finland | 11 |
| 49 | Luxembourg | 11 |
| 50 | Greece | 10 |
| 51 | Peru | 10 |
| 52 | Saudi Arabia | 10 |
| 53 | Bulgaria | 9 |
| 54 | Hungary | 9 |
| 55 | Iceland | 9 |
| 56 | Jordan | 8 |
| 57 | Kuwait | 7 |
| 58 | Qatar | 7 |
| 59 | Serbia | 7 |
| 60 | Morocco | 6 |
| 61 | Cambodia | 5 |
| 62 | Kenya | 5 |
| 63 | Vietnam | 5 |
| 64 | West Germany | 5 |
| 65 | Croatia | 4 |
| 66 | Ghana | 4 |
| 67 | Iran | 4 |
| 68 | Portugal | 4 |
| 69 | Bangladesh | 3 |
| 70 | Malta | 3 |
| 71 | Senegal | 3 |
| 72 | Slovenia | 3 |
| 73 | Soviet Union | 3 |
| 74 | Ukraine | 3 |
| 75 | Venezuela | 3 |
| 76 | Zimbabwe | 3 |
| 77 | Algeria | 2 |
| 78 | Cayman Islands | 2 |
| 79 | Georgia | 2 |
| 80 | Guatemala | 2 |
| 81 | Iraq | 2 |
| 82 | Namibia | 2 |
| 83 | Nepal | 2 |
| 84 | Afghanistan | 1 |
| 85 | Albania | 1 |
| 86 | Angola | 1 |
| 87 | Armenia | 1 |
| 88 | Azerbaijan | 1 |
| 89 | Bahamas | 1 |
| 90 | Belarus | 1 |
| 91 | Bermuda | 1 |
| 92 | Botswana | 1 |
| 93 | Cuba | 1 |
| 94 | Cyprus | 1 |
| 95 | Dominican Republic | 1 |
| 96 | East Germany | 1 |
| 97 | Ecuador | 1 |
| 98 | Jamaica | 1 |
| 99 | Kazakhstan | 1 |
| 100 | Latvia | 1 |
| 101 | Liechtenstein | 1 |
| 102 | Lithuania | 1 |
| 103 | Malawi | 1 |
| 104 | Mauritius | 1 |
| 105 | Mongolia | 1 |
| 106 | Montenegro | 1 |
| 107 | Nicaragua | 1 |
| 108 | Panama | 1 |
| 109 | Paraguay | 1 |
| 110 | Puerto Rico | 1 |
| 111 | Samoa | 1 |
| 112 | Slovakia | 1 |
| 113 | Somalia | 1 |
| 114 | Sri Lanka | 1 |
| 115 | Sudan | 1 |
| 116 | Syria | 1 |
| 117 | Uganda | 1 |
| 118 | Vatican City | 1 |

| country | listed_in | n |
|---|---|---|
| India | International Movies | 811 |
| India | Dramas | 611 |
| United States | Dramas | 584 |
| United States | Comedies | 525 |
| United States | Documentaries | 421 |
| United States | Independent Movies | 310 |
| United States | Children & Family Movies | 303 |
| India | Comedies | 300 |
| United States | Action & Adventure | 245 |
| United States | TV Comedies | 228 |
| United States | Stand-Up Comedy | 212 |
| United States | Thrillers | 205 |
| United States | TV Dramas | 196 |
| United States | Kids’ TV | 170 |
| United States | Docuseries | 169 |
| United States | Romantic Movies | 166 |
| United States | Horror Movies | 152 |
| India | Independent Movies | 140 |
| India | Action & Adventure | 127 |
| United States | Sci-Fi & Fantasy | 127 |
| United States | Crime TV Shows | 114 |
| India | Romantic Movies | 113 |
| United States | Music & Musicals | 109 |
| United States | Reality TV | 109 |
| India | Music & Musicals | 94 |
| United States | Sports Movies | 94 |
| India | Thrillers | 88 |
| United States | TV Action & Adventure | 77 |
| India | International TV Shows | 60 |
| United States | Classic Movies | 59 |
| United States | LGBTQ Movies | 54 |
| United States | TV Sci-Fi & Fantasy | 52 |
| United States | TV Mysteries | 42 |
| United States | Science & Nature TV | 41 |
| United States | International Movies | 38 |
| United States | International TV Shows | 38 |
| United States | Cult Movies | 37 |
| United States | Romantic TV Shows | 35 |
| India | Horror Movies | 33 |
| United States | Stand-Up Comedy & Talk Shows | 33 |
| United States | Faith & Spirituality | 30 |
| United States | TV Horror | 30 |
| United States | Teen TV Shows | 29 |
| India | TV Comedies | 25 |
| India | TV Dramas | 25 |
| United States | Movies | 22 |
| United States | TV Thrillers | 22 |
| India | Children & Family Movies | 19 |
| India | Documentaries | 19 |
| United States | Classic & Cult TV | 16 |
| United States | Spanish-Language TV Shows | 16 |
| India | Sports Movies | 15 |
| India | Classic Movies | 11 |
| India | Kids’ TV | 11 |
| United States | Anime Series | 11 |
| India | Sci-Fi & Fantasy | 10 |
| India | Crime TV Shows | 9 |
| India | Romantic TV Shows | 9 |
| United States | British TV Shows | 9 |
| India | Docuseries | 7 |
| India | TV Horror | 7 |
| India | Stand-Up Comedy | 6 |
| India | Cult Movies | 5 |
| India | TV Action & Adventure | 5 |
| India | Faith & Spirituality | 3 |
| India | Reality TV | 3 |
| India | Stand-Up Comedy & Talk Shows | 3 |
| India | TV Mysteries | 3 |
| India | TV Sci-Fi & Fantasy | 3 |
| India | TV Thrillers | 3 |
| United States | TV Shows | 3 |
| India | LGBTQ Movies | 2 |
| India | TV Shows | 2 |
| United States | Anime Features | 2 |
| United States | Korean TV Shows | 2 |
| India | British TV Shows | 1 |
| India | Teen TV Shows | 1 |

---
title: "Analyzing Netflix Content Produced in USA and India"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bg: "#ffffff"
      fg: "#101010"
    orientation: columns
    storyboard: true
    social: menu
    source: embed
---
```{r setup, include=FALSE}
# Dashboard, theme and data
library(flexdashboard)
library(bslib)
library(thematic)
library(NetflixData)

# Wrangling and plotting (tidyverse already attaches dplyr and ggplot2)
library(tidyverse)
library(ggthemes)
library(ggridges)
library(plotly)
library(kableExtra)
library(gridExtra)
library(RColorBrewer)

# Modelling packages from the original setup (not used by the panels below)
library(e1071)
library(mlbench)

# Text mining packages
library(tm)
library(SnowballC)
library(wordcloud)
```

### United States and India are the two countries that produce the most content on Netflix
```{r}
# Number of titles per production country, from most to least
# (named country_counts so it does not shadow base::c())
country_counts <- country_data %>%
  count(country) %>%
  arrange(desc(n)) %>%
  na.omit()

country_counts %>%
  kbl(caption = "Count Of Content By Country") %>%
  kable_material_dark("striped") %>%
  scroll_box(width = "825px", height = "250px")
```

```{r}
# Number of titles per genre for the two largest producers
genre_counts <- country_listedin %>%
  filter(country %in% c("United States", "India")) %>%
  count(country, listed_in) %>%
  arrange(desc(n)) %>%
  na.omit()

genre_counts %>%
  kbl(caption = "Count Of Genre Content Produced In US & India") %>%
  kable_material_dark("striped") %>%
  scroll_box(width = "825px", height = "250px")
```
***
- The table is sorted from the countries that produce the *most to the least* Movies and TV Shows available on **Netflix** *since 2019*
- **3297** Movies and TV Shows available on Netflix were produced in the **United States**
- **990** Movies and TV Shows available on Netflix were produced in **India**
- What kinds of questions can we answer using this dataset/package?
    - How does the duration of movies differ between these countries?
    - How do the Movies and TV Shows produced in these countries differ?
    - What genres are popular in the United States? What genres are popular in India?
    - What is the maturity rating distribution like for these two countries?
    - Drama seems to be a popular genre in both the United States and India. How do these two countries describe titles in this genre?

### Duration Distribution of USA and Indian Produced Movies (all genres)
```{r}
# Movies whose country field mentions India / the United States.
# grepl() matches substrings, so a title listing several countries
# is kept if any of them matches.
india_duration <- netflix %>%
  dplyr::filter(grepl("India", country), grepl("Movie", type))
usa_duration <- netflix %>%
  dplyr::filter(grepl("United States", country), grepl("Movie", type))

# One box per country, duration in minutes
fig_dur <- plot_ly(y = ~india_duration$duration, color = I("red"),
                   type = "box", name = "India") %>%
  add_trace(y = ~usa_duration$duration, color = I("black"), name = "USA") %>%
  layout(title = "India vs. USA: Movie Duration",
         yaxis = list(title = "Duration In Minutes"))
fig_dur
```
***
- This boxplot shows the distribution of duration for Movies produced in India and the United States, *regardless* of genre
- Most Indian-produced movies on Netflix are Bollywood productions, and Bollywood films tend to run considerably longer than US films
- The median length of Indian-produced movies available on Netflix is over 2 hours, while the median for US movies is around 1.5 hours (1 hour 30 minutes)
    - **India:** the interquartile range spans *1 hour 43 minutes* to *2 hours 21 minutes*
    - **USA:** the interquartile range spans *1 hour 21 minutes* to *1 hour 45 minutes*
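
The median and interquartile figures above come from the 25th, 50th and 75th percentiles of each country's durations. A minimal sketch of that calculation, assuming `duration` is numeric minutes; `fmt_minutes()` and `duration_summary()` are hypothetical helpers, not part of NetflixData, and the input vector is illustrative only.

```r
# Hypothetical helper: format a duration in minutes as "Xh YYm"
fmt_minutes <- function(m) {
  m <- round(m)
  sprintf("%dh %02dm", m %/% 60, m %% 60)
}

# Hypothetical helper: first quartile, median and third quartile of a
# numeric vector of durations in minutes (NA values are dropped)
duration_summary <- function(minutes) {
  q <- quantile(minutes, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(q1 = fmt_minutes(q[1]), median = fmt_minutes(q[2]), q3 = fmt_minutes(q[3]))
}

# Illustrative values only (not taken from the Netflix data)
duration_summary(c(95, 103, 120, 141, 150))
```

On the dashboard data the same call would be `duration_summary(india_duration$duration)`, provided the column really holds minutes as numbers.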

### USA and India: Taking a closer look at genre
```{r}
# Title counts per genre for each country, sorted alphabetically by genre
usa_listedin <- country_listedin %>%
  filter(country == "United States") %>%
  count(listed_in) %>%
  arrange(listed_in) %>%
  na.omit()

india_listedin <- country_listedin %>%
  filter(country == "India") %>%
  count(listed_in) %>%
  arrange(listed_in) %>%
  na.omit()

# Stacked bars: the USA trace must be passed as data = ..., otherwise
# add_trace() receives the data frame as an unnamed attribute
fig <- plot_ly(india_listedin, x = ~listed_in, y = ~n, type = "bar",
               name = "India", color = I("red")) %>%
  add_trace(data = usa_listedin, x = ~listed_in, y = ~n,
            name = "USA", color = I("black")) %>%
  layout(xaxis = list(title = "Genre"), yaxis = list(title = "Count"),
         barmode = "stack")
fig
```
***
- Setting aside the *International Movies* genre (Indian-produced Movies and TV Shows typically land in this category simply because of where they were made), both the United States and India produce a large number of movies in the **Dramas** category
- There are **584 movies** produced in the United States labeled **Dramas**
- There are **611 movies** produced in India labeled **Dramas**
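
Setting aside one genre and then picking each country's largest remaining genre can be sketched in base R as follows. The rows are copied from the genre-count table; `top_genre()` is a hypothetical helper, not a NetflixData function.

```r
# A few rows copied from the country/genre count table
genre_sample <- data.frame(
  country   = c("India", "India", "United States", "United States", "India"),
  listed_in = c("International Movies", "Dramas", "Dramas", "Comedies", "Comedies"),
  n         = c(811, 611, 584, 525, 300),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent genre per country after dropping
# the excluded genre(s)
top_genre <- function(df, exclude = "International Movies") {
  kept <- df[!df$listed_in %in% exclude, ]
  kept <- kept[order(kept$country, -kept$n), ]   # largest n first within country
  kept[!duplicated(kept$country), ]              # keep the first row per country
}

top_genre(genre_sample)
```

With International Movies excluded, Dramas comes out on top for both countries in these rows; the same function could be run on the full country/genre count table.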

### USA and India: Taking a closer look at maturity rating proportion distribution (regardless of genre)
```{r}
# Title counts per maturity rating for each country
usa_ratingdata <- country_data %>%
  filter(country == "United States") %>%
  count(country, rating) %>%
  arrange(rating) %>%
  na.omit()

india_ratingdata <- country_data %>%
  filter(country == "India") %>%
  count(country, rating) %>%
  arrange(rating) %>%
  na.omit()

# Side-by-side bar charts; both traces need type = "bar", otherwise
# plotly guesses a different trace type for the USA panel
india_rating_plot <- plot_ly(india_ratingdata, x = ~rating, y = ~n, type = "bar",
                             name = "India", color = I("red"))
usa_rating_plot <- plot_ly(usa_ratingdata, x = ~rating, y = ~n, type = "bar",
                           name = "USA", color = I("black"))
subplot(india_rating_plot, usa_rating_plot)
```
***
- From my understanding, and from the movies I've watched, Indian productions tend to be more family-friendly, which fits with TV-14 being the most common rating
- The United States tends to produce more mature Movies and TV Shows, but its titles also span a wider variety of ratings than India's. This may be because Netflix was founded in the US, so the available content skews toward US-produced titles
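
Comparing two countries with very different catalogue sizes is easier on proportions than on raw counts. A minimal base-R sketch, where `rating_shares()` is a hypothetical helper and the ratings vector is illustrative only:

```r
# Hypothetical helper: share of titles per rating, largest first.
# table() ignores NA values by default, so unrated titles are dropped.
rating_shares <- function(ratings) {
  counts <- table(ratings)
  sort(prop.table(counts), decreasing = TRUE)
}

# Illustrative ratings only (not taken from the Netflix data)
shares <- rating_shares(c("TV-14", "TV-MA", "TV-14", "TV-PG", NA, "TV-14"))
names(shares)[1]   # most common rating
```

Applied to the dashboard data, assuming `country_data` keeps one row per title and country, this would be `rating_shares(country_data$rating[country_data$country == "India"])`.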

### Most Frequent Description Words for Dramas Produced In India
```{r}
# Build a word-frequency table (word, freq) from a vector of descriptions
word_freq <- function(descriptions) {
  corpus <- Corpus(VectorSource(descriptions))
  # content_transformer() keeps each document a tm object after tolower()
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, c("cloth", stopwords("english")))
  corpus <- tm_map(corpus, stripWhitespace)
  m <- as.matrix(TermDocumentMatrix(corpus))
  v <- sort(rowSums(m), decreasing = TRUE)
  data.frame(word = names(v), freq = v)
}

# Titles tagged with a Dramas genre (the pattern also matches "TV Dramas")
india_dramas <- netflix %>%
  dplyr::filter(grepl("Dramas", listed_in), grepl("India", country))
usa_dramas <- netflix %>%
  dplyr::filter(grepl("Dramas", listed_in), grepl("United States", country))

india_data <- word_freq(india_dramas$description)
usa_data <- word_freq(usa_dramas$description)
```
```{r}
# Fixed seed so the word-cloud layout is reproducible
set.seed(4)
top_india <- head(india_data, 10)
barplot(top_india$freq, las = 2, names.arg = top_india$word,
        col = "red", main = "Most Frequent Description Words for Dramas In India",
        ylab = "Word frequencies")
# The Dark2 palette has at most 8 colours
wordcloud(words = india_data$word, freq = india_data$freq, min.freq = 10,
          max.words = 25, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
```
***
- Bollywood movies typically lean toward romance, so I am not surprised that "love" is a frequent description word
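
The tm pipeline boils down to: lowercase, strip punctuation, split into words, drop stop words, count. A dependency-free sketch of those steps follows; the tiny stop-word list and `word_freq_base()` are hypothetical stand-ins for `stopwords("english")` and the tm functions, and the descriptions are illustrative only.

```r
# Hypothetical, tiny stop-word list (tm uses stopwords("english"))
stop_words <- c("a", "an", "the", "and", "of", "in", "to", "his", "her")

word_freq_base <- function(descriptions, stop = stop_words) {
  text <- tolower(descriptions)
  text <- gsub("[[:punct:]]", "", text)      # drop punctuation
  words <- unlist(strsplit(text, "\\s+"))    # split on whitespace
  words <- words[nzchar(words) & !words %in% stop]
  sort(table(words), decreasing = TRUE)      # most frequent first
}

# Illustrative descriptions only (not taken from the Netflix data)
freq <- word_freq_base(c("A young man falls in love.",
                         "Love and loss in the life of a young doctor."))
head(freq, 3)
```

Counts from `TermDocumentMatrix()` can differ slightly, since tm uses its own tokenizer and by default ignores words shorter than three characters.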

### Most Frequent Description Words for Dramas Produced In USA
```{r}
# Fixed seed so the word-cloud layout is reproducible
set.seed(4)
top_usa <- head(usa_data, 10)
barplot(top_usa$freq, las = 2, names.arg = top_usa$word,
        col = "black", main = "Most Frequent Description Words for Dramas In USA",
        ylab = "Word frequencies")
# The Dark2 palette has at most 8 colours
wordcloud(words = usa_data$word, freq = usa_data$freq, min.freq = 10,
          max.words = 25, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
```
***
- The word "young" is the most frequent description word for both US and Indian drama titles
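
To check which words rank highly in both countries, one option is to intersect the two top-n lists. `shared_top_words()` is a hypothetical helper and the frequencies are illustrative only.

```r
# Hypothetical helper: words in the top n of both frequency vectors.
# Each argument is a named numeric vector (word -> frequency); the result
# keeps the ranking order of the first vector.
shared_top_words <- function(freq_a, freq_b, n = 10) {
  top_a <- names(sort(freq_a, decreasing = TRUE))[seq_len(min(n, length(freq_a)))]
  top_b <- names(sort(freq_b, decreasing = TRUE))[seq_len(min(n, length(freq_b)))]
  intersect(top_a, top_b)
}

# Illustrative frequencies only (not taken from the Netflix data)
india_freq <- c(young = 12, love = 10, family = 8, life = 7)
usa_freq   <- c(young = 15, life = 9, town = 6, love = 3)
shared_top_words(india_freq, usa_freq, n = 3)
```

With the dashboard tables this would be `shared_top_words(setNames(india_data$freq, india_data$word), setNames(usa_data$freq, usa_data$word))`.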

### Duration Distribution of USA and Indian Produced Drama Movies
```{r}
# Drama movies only; grepl() also keeps titles that list several countries
india_duration_drama <- netflix %>%
  dplyr::filter(grepl("India", country), grepl("Dramas", listed_in),
                grepl("Movie", type))
usa_duration_drama <- netflix %>%
  dplyr::filter(grepl("United States", country), grepl("Dramas", listed_in),
                grepl("Movie", type))

# One box per country, duration in minutes
fig_dur_drama <- plot_ly(y = ~india_duration_drama$duration, color = I("red"),
                         type = "box", name = "India") %>%
  add_trace(y = ~usa_duration_drama$duration, color = I("black"), name = "USA") %>%
  layout(title = "India vs. USA: Duration Of Drama Movies",
         yaxis = list(title = "Duration In Minutes"))
fig_dur_drama
```
***
- I repeated the duration boxplot for India- and USA-produced movies, this time restricted to the Drama category, and found that the two distributions were similar
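
"Similar" here is a visual judgement. One way to back it with a number, which is a swapped-in technique and not the author's method, is a two-sided Wilcoxon rank-sum test alongside the medians. `compare_durations()` is a hypothetical helper and the durations are illustrative only.

```r
# Hypothetical helper: medians plus a two-sided Wilcoxon rank-sum test.
# wilcox.test() drops NA values; exact = FALSE uses the normal approximation,
# which also avoids warnings when durations contain ties.
# A large p-value means no evidence of a shift, not proof of equality.
compare_durations <- function(x, y) {
  test <- wilcox.test(x, y, exact = FALSE)
  list(median_x = median(x, na.rm = TRUE),
       median_y = median(y, na.rm = TRUE),
       p_value  = test$p.value)
}

# Illustrative values only (not taken from the Netflix data)
res <- compare_durations(c(110, 125, 132, 140, 151), c(105, 118, 130, 138, 149))
```

On the dashboard data the call would be `compare_durations(india_duration_drama$duration, usa_duration_drama$duration)`.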

### Duration Across Maturity Ratings
```{r}
# Ridgeline density of movie duration for each maturity rating
india_duration2 <- netflix %>%
  dplyr::filter(grepl("India", country), grepl("Movie", type))
indiadensity <- ggplot(india_duration2,
                       aes(x = duration, y = rating, fill = rating)) +
  geom_density_ridges() +
  theme_few() +
  labs(title = "India Movie Duration Distribution",
       x = "Duration in Minutes", y = "Maturity Rating") +
  theme(legend.position = "none")

usa_duration2 <- netflix %>%
  dplyr::filter(grepl("United States", country), grepl("Movie", type))
usadensity <- ggplot(usa_duration2,
                     aes(x = duration, y = rating, fill = rating)) +
  geom_density_ridges() +
  theme_few() +
  labs(title = "United States Movie Duration Distribution",
       x = "Duration in Minutes", y = "Maturity Rating") +
  theme(legend.position = "none")

# Stack the two panels vertically (gridExtra is loaded in the setup chunk)
gridExtra::grid.arrange(indiadensity, usadensity)
```
***
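- A quick ridgeline density plot (`geom_density_ridges()` from ggridges) of movie duration for each maturity rating, shown separately for India and the United States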
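
Each ridge is a kernel density estimate of duration within one rating. A base-R sketch that reports where each rating's density peaks, assuming numeric durations in minutes; `density_peaks()` is a hypothetical helper and the inputs are illustrative only. ggridges picks one joint bandwidth for all ridges, whereas this sketch lets `density()` choose a bandwidth per group, so peak positions may differ slightly from the plot.

```r
# Hypothetical helper: for each rating, fit a Gaussian kernel density to
# the durations and return the duration where the density is highest.
density_peaks <- function(duration, rating, min_n = 2) {
  groups <- split(duration, rating)                   # NA ratings are dropped
  groups <- lapply(groups, function(g) g[!is.na(g)])  # drop NA durations
  groups <- groups[vapply(groups, length, integer(1)) >= min_n]
  vapply(groups, function(g) {
    d <- density(g)                                   # default bandwidth (bw.nrd0)
    d$x[which.max(d$y)]
  }, numeric(1))
}

# Illustrative values only (not taken from the Netflix data)
density_peaks(c(90, 92, 95, 150, 152, 155, NA),
              c("PG", "PG", "PG", "TV-MA", "TV-MA", "TV-MA", "TV-MA"))
```

On the dashboard data this would be `density_peaks(india_duration2$duration, india_duration2$rating)`.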